KenSwQuAD—A Question Answering Dataset for Swahili Low-resource Language
Authors
Abstract
The need for question-answering (QA) datasets in low-resource languages is the motivation of this research, leading to the development of the Kencorpus Swahili Question Answering Dataset (KenSwQuAD). This dataset is annotated from raw story texts in Swahili, a low-resource language that is predominantly spoken in eastern Africa and in other parts of the world. Question-answering datasets are important for machine comprehension of natural language in tasks such as internet search and dialog systems. Machine learning systems need training data such as the gold-standard QA set developed in this research. The research engaged annotators to formulate QA pairs from Swahili texts collected by the Kencorpus project, a Kenyan languages corpus. The project annotated 1,445 texts out of a total of 2,585 with at least 5 QA pairs each, resulting in a final dataset of 7,526 QA pairs. A quality assurance set of 12.5% of the annotated texts confirmed that the QA pairs were all correctly annotated. A proof of concept on applying the set to the QA task confirmed that the dataset can be usable for such tasks. KenSwQuAD has also contributed to the resourcing of the Swahili language.
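The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the paper: the variable names are illustrative, and it assumes the 12.5% quality-assurance sample is taken over the 1,445 annotated texts.

```python
# Figures quoted in the abstract.
TOTAL_TEXTS = 2585        # texts collected in total
ANNOTATED_TEXTS = 1445    # texts that received QA annotations
QA_PAIRS = 7526           # final number of QA pairs
MIN_PAIRS_PER_TEXT = 5    # "at least 5" pairs per annotated text

# Assumption (not confirmed here): the 12.5% sample is a share of annotated texts.
QA_SAMPLE_FRACTION = 0.125

coverage = ANNOTATED_TEXTS / TOTAL_TEXTS                    # share of texts annotated
mean_pairs = QA_PAIRS / ANNOTATED_TEXTS                     # average pairs per annotated text
min_expected_pairs = ANNOTATED_TEXTS * MIN_PAIRS_PER_TEXT   # lower bound implied by "at least 5"
qa_sample_texts = ANNOTATED_TEXTS * QA_SAMPLE_FRACTION      # size of the quality-assurance sample

print(f"coverage of collected texts: {coverage:.1%}")
print(f"mean QA pairs per annotated text: {mean_pairs:.2f}")
print(f"lower bound on QA pairs: {min_expected_pairs}")
print(f"texts in quality-assurance sample: {qa_sample_texts:.1f}")
```

Under these assumptions, about 55.9% of the collected texts were annotated, with an average of about 5.2 QA pairs per text. The reported 7,526 pairs sit above the 7,225-pair lower bound implied by "at least 5 each", and the quality-assurance sample corresponds to roughly 181 texts.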
Similar resources
Resource Analysis for Question Answering
This paper attempts to analyze and bound the utility of various structured and unstructured resources in Question Answering, independent of a specific system or component. We quantify the degree to which gazetteers, web resources, encyclopedia, web documents and web-based query expansion can help Question Answering in general and specific question types in particular. Depending on which resourc...
SQuAD Question Answering Dataset: CS224N Assn 4
We solve the contextual question answering problem, which is an essential part of many automated question-answering systems. The SQuAD dataset [1] was recently released, and several deep learning approaches have been proposed to solve it. We implement a modified version of one of them, the Dynamic Coattention model, as well as a simple baseline.
Question Answering on the SQuAD Dataset
We develop a deep learning framework for question answering on the Stanford Question Answering Dataset (SQuAD), blending ideas from existing state-of-the-art models to achieve results that surpass the original logistic regression baselines. Using a dynamic coattention encoder and an LSTM decoder, we achieved an F1 score of 55.9% on the hidden SQuAD test set. In this paper, we present the methodo...
Language Independent Passage Retrieval for Question Answering
Passage Retrieval (PR) is typically used as the first step in current Question Answering (QA) systems. Most methods are based on the vector space model allowing the finding of relevant passages for general user needs, but failing on selecting pertinent passages for specific user questions. This paper describes a simple PR method specially suited for the QA task. This method considers the struct...
Investigating Embedded Question Reuse in Question Answering
This paper presents a novel method in question answering (QA) that enables a QA system to gain performance by reusing information in the answer to one question to answer another related question. Our analysis shows that a pair of questions in general open-domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...
Journal
Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing
Year: 2023
ISSN: 2375-4699, 2375-4702
DOI: https://doi.org/10.1145/3578553